FIGURE 6.17
(a) Effect of μ. (b) Effect of λ and γ. On VOC, we (a) select μ on the raw detector and different KD methods including Hint [33], FGFI [235], and IDa-Det; (b) select λ and γ on IDa-Det with μ set to 1e−4.
by 2.5%, 2.4%, and 1.8% compared with no distillation, Hint, and FGFI under the same
student-teacher framework. We then evaluate the proposed entropy distillation loss against
the conventional ℓ2 loss, the inner-product loss, and the cosine-similarity loss. As shown
in Table 6.5, our entropy distillation loss improves distillation performance by 0.4%, 0.3%,
and 0.4% over the ℓ2 loss when combined with Hint, FGFI, and IDa, respectively. It also
outperforms the inner-product and cosine-similarity losses by 2.1% and 0.5% mAP in our
framework, further demonstrating the effectiveness of our method.
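To make the compared objectives concrete, the following is a minimal PyTorch sketch of the four distillation losses in Table 6.5, applied to selected proposal features f_s (student) and f_t (teacher) flattened to shape (num_proposals, dim). The function names are ours, and entropy_loss is an illustrative Gaussian KL-style stand-in for the entropy distillation loss rather than the exact formulation used in IDa-Det:

import torch
import torch.nn.functional as F

def l2_loss(f_s, f_t):
    # Conventional feature-mimicking objective (mean-squared error).
    return F.mse_loss(f_s, f_t)

def inner_product_loss(f_s, f_t):
    # One plausible form: maximize the inner product between student and
    # teacher features, i.e., minimize its negation.
    return -(f_s * f_t).sum(dim=1).mean()

def cosine_similarity_loss(f_s, f_t):
    # Scale-invariant alignment: 1 - cos(f_s, f_t) per proposal.
    return (1.0 - F.cosine_similarity(f_s, f_t, dim=1)).mean()

def entropy_loss(f_s, f_t, eps=1e-6):
    # Illustrative stand-in: model each proposal's features as a Gaussian
    # and penalize the KL divergence between the student's and teacher's
    # per-proposal statistics (mean and variance).
    mu_s, var_s = f_s.mean(dim=1), f_s.var(dim=1) + eps
    mu_t, var_t = f_t.mean(dim=1), f_t.var(dim=1) + eps
    kl = 0.5 * (torch.log(var_t / var_s)
                + (var_s + (mu_s - mu_t) ** 2) / var_t - 1.0)
    return kl.mean()

During distillation, any of these terms would be weighted (e.g., by μ, as studied in Fig. 6.17) and added to the detection loss.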
TABLE 6.5
The effects of different components in IDa-Det with the Faster-RCNN
model on the PASCAL VOC dataset.

Model            Proposal selection   Distillation method   mAP
Res18            –                    –                     78.6
BiRes18          –                    –                     74.0
Res101-BiRes18   Hint                 ℓ2                    74.1
Res101-BiRes18   Hint                 Entropy loss          74.5
Res101-BiRes18   FGFI                 ℓ2                    74.7
Res101-BiRes18   FGFI                 Entropy loss          75.0
Res101-BiRes18   IDa                  Inner-product         74.8
Res101-BiRes18   IDa                  Cosine similarity     76.4
Res101-BiRes18   IDa                  ℓ2                    76.5
Res101-BiRes18   IDa                  Entropy loss          76.9
Note: Hint [33] and FGFI [235] are used for comparison with our information discrepancy-aware
proposal selection (IDa). IDa and the entropy loss are the main components of the proposed
IDa-Det.